Automatic word acquisition from continuous speech

Authors

  • Helmut Lucke
  • Masanori Omote
Abstract

This paper describes a method for learning lexical representations of unknown words in an unsupervised manner. The unknown words are automatically extracted from continuous speech, and a clustering algorithm derives word clusters and lexical representations based on the set of phonetic units used in the system. Experiments verify the robustness of the approach. An interesting feature is that extraction errors usually do no harm: wrongly extracted words tend to inhabit clusters by themselves and thus do not adversely affect the modeling of correctly extracted words.
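The abstract does not say which clustering algorithm or distance measure the authors use. As a rough illustration only, the sketch below assumes that each extracted word token is a sequence of phone symbols, that tokens are compared by edit distance, and that a simple greedy threshold clustering is applied; the example tokens, the threshold, and the function names are all hypothetical. It shows how a wrongly extracted token can end up alone in its own cluster, leaving the other clusters untouched.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster(tokens: List[Sequence[str]], max_dist: int) -> List[List[int]]:
    """Greedy clustering (an assumption, not the paper's algorithm):
    a token joins the first cluster that has a member within max_dist,
    otherwise it starts a new cluster. Returns lists of token indices."""
    clusters: List[List[int]] = []
    for idx, tok in enumerate(tokens):
        for members in clusters:
            if any(edit_distance(tok, tokens[m]) <= max_dist for m in members):
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def medoid(members: List[int], tokens: List[Sequence[str]]) -> int:
    """Index of the member with the smallest total distance to the others;
    used here as a stand-in for the cluster's lexical representation."""
    return min(members,
               key=lambda m: sum(edit_distance(tokens[m], tokens[o])
                                 for o in members))


# Hypothetical phone-level transcriptions of extracted segments.
tokens = [
    ["t", "o", "k", "y", "o"],
    ["t", "o", "k", "i", "o"],
    ["t", "o", "k", "y", "o", "u"],
    ["s", "a", "k", "u", "r", "a"],
    ["s", "a", "k", "u", "r", "a", "a"],
    ["e", "t", "o", "n", "n"],  # stands in for a wrongly extracted segment
]

clusters = cluster(tokens, max_dist=1)
representatives = [tokens[medoid(c, tokens)] for c in clusters]
print(clusters)
print(representatives)
```

In this toy setup the mis-extracted token shares no close neighbour, so it forms a singleton cluster and has no influence on the representatives chosen for the other two clusters.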

Similar resources

The Role of Interactivity in Human-Machine Conversation for Automatic Word Acquisition

Motivated by the psycholinguistic finding that human eye gaze is tightly linked to speech production, previous work has applied naturally occurring eye gaze for automatic vocabulary acquisition. However, unlike in the typical settings for psycholinguistic studies, eye gaze can serve different functions in human-machine conversation. Some gaze streams do not link to the content of the spoken utt...

Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition

This paper proposes using clustering algorithms for data fusion at the decision level. The results of automatic isolated word recognition, derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined using fuzzy clustering algorithms, in particular fuzzy k-means and fuzzy vector quantization. Experimental results show that the...

Learning Word-Like Units from Joint Audio-Visual Analysis

Given a collection of images and spoken audio captions, we present a method for discovering word-like acoustic units in the continuous speech signal and grounding them to semantically relevant image regions. For example, our model is able to detect spoken instances of the word “lighthouse” within an utterance and associate them with image regions containing lighthouses. We do not use any form ...

Word Order Acquisition in Persian Speaking Children

Objectives: Persian is a pro-drop language with canonical Subject-Object-Verb (SOV) word order. This study investigates the acquisition of word order in Persian-speaking children. Methods: The participants were 60 Persian-speaking children (30 girls and 30 boys) with typically developing language skills, aged 30 to 47 months. The 30-minute language samples were audio...

Rapid Statistical Learning Supporting Word Extraction From Continuous Speech.

The identification of words in continuous speech, known as speech segmentation, is a critical early step in language acquisition. This process is partially supported by statistical learning, the ability to extract patterns from the environment. Given that speech segmentation represents a potential bottleneck for language acquisition, patterns in speech may be extracted very rapidly, without ext...

Publication date: 2001